Machine-implemented method for obtaining data from a nonlinear dynamic real system during a test run

ABSTRACT

In a method for obtaining data from a nonlinear dynamic real system during a test run, for instance an internal combustion engine, a drivetrain, or parts thereof, a sequence of dynamic excitation signals is generated according to an initial design and a system output is measured. To enable the quick and precise generation of experimental designs for global measurement, modeling, and optimization of a nonlinear dynamic real system, the sequence of dynamic excitation signals is generated by generating a design with a sequence of excitation signals, obtaining output data by feeding said sequence into a model of the real system, determining a criterion for the information content of the complete design of experiment sequence, varying the sequence, obtaining new output data by feeding said modified sequence into the model, determining said criterion again, iterating until said criterion improves, and using the improved sequence of excitation signals for the real system.

TECHNICAL FIELD

The present disclosure is related to a machine-implemented method for obtaining data from a nonlinear dynamic real system during a test run, for instance an internal combustion engine, a drive-train or parts thereof.

BACKGROUND

There is an ever-increasing need in the automotive field for efficient and accurate models since the calibration of the motor control system is becoming increasingly complex, and also increasingly expensive due to stricter and stricter regulatory requirements. The principal requirements for good models are good measurement data and an appropriately selected measurement design. As a result, the number of measurements and thus also the measurement period increases. However, since time on the test stand is very expensive, the need arises for effective experimental designs that minimize the number of measurement points, cover the test space as effectively as possible, while at the same time not qualitatively degrading the models trained using these data. These models are then used to optimize and calibrate ECU structures, or also to make decisions regarding components.

Optimal experiment design (OED) optimizes the information content that is required to properly parametrize a model with as little effort as possible, as explained in L. Pronzato “Optimal experimental design and some related control problems.” Automatica, 44(2):303-325, 2008. Dynamic excitation signals are characterized by their spatial distribution and their temporal behaviour. For the identification of linear dynamic systems, pseudo random binary signals (PRBS) are commonly used, see G. C. Goodwin and R. L. Payne “Dynamic System Identification: Experiment Design and Data Analysis” Academic Press Inc., New York, 1977. For nonlinear dynamic systems, amplitude modulated pseudo random binary signals (APRBS) are well established as excitation signals, in order to track the nonlinear process characteristics, see e.g. O. Nelles “Nonlinear System Identification: From Classical Approaches to Neural Networks and Fuzzy Models” Springer, Berlin, 2001. In contrast to these very general methodologies for experiment design, model based design of experiments (DoE) is more specifically suited to the process to be identified, in that a prior process model or at least a model structure is used to maximize the information gained from experiments.

Real processes are subject to restrictions, which basically affect system inputs and outputs. For example, the manipulated and the control variables must not leave the feasible range, so that specified operating conditions are maintained and damage to the plant is prevented. For the consideration of system output constraints in the DoE, a model is required which predicts the output dynamics. Model based DoE can also be used for online experiment design, where the model is continuously adapted to incoming data and the DoE is generated sequentially for a certain number of future system inputs. Such a procedure is commonly called online or adaptive DoE and is depicted in FIG. 1. Explanations can be found in Online Dynamic Black Box Modelling and Adaptive Experiment Design in Combustion Engine Calibration, Munich, Germany, 2010 and in László Gerencsér, Håkan Hjalmarsson and Jonas Mårtensson “Identification of ARX systems with nonstationary inputs - asymptotic analysis with application to adaptive input design” Automatica, 45:623-633, March 2009.

SUMMARY

The purpose of the method presented here is to generate experimental designs that are matched to typical applications in engines or drive train development, and calibration of the engine control unit (ECU) or transmission control unit (TCU). The method is intended to enable the quick and precise generation of experimental designs for global measurement, modeling, and optimization of a nonlinear dynamic real system, for example, of an internal combustion engine, a drive train, or subsystems thereof, as well as the global optimization thereof while taking into account experimental limits and additional criteria.

In order to achieve this purpose, the method described in the introduction is characterized by the characterizing part of claim 1. Preferred embodiments of that basic concept are given in the dependent claims.

The main advantages of model based DoE in contrast to conventional DoE techniques are that the model is used to optimize the DoE so that the experiments are as effective as possible, that various constraints, including constraints on the system output, can be incorporated, and that application in an online DoE is possible.

The invention is explained in more detail in the following specification, based on preferred examples and relating to the attached drawing figures.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an offline and online DoE procedure,

FIG. 2 shows a two layer feedforward perceptron with three inputs, two hidden units and one output,

FIG. 3 depicts a NNOE structure with one single input u and deadtime d,

FIG. 4 shows a nonlinear constrained optimization with inequality constraint for a two dimensional input signal u,

FIG. 5 depicts the excitation signal u and model output ŷ with output limit violation ŷ(k)>y_(max),

FIG. 6 shows the structure of the Wiener model,

FIG. 7 is an illustration of the excitation signal u, the model output ŷ and the input rate Δu_(rate) with the associated constraints (plotted in dashed lines) for the Initial Design (I), the First Valid Design (II) (all constraints are complied with for the first time) and the Final Design (III), whereby the Initial Design exceeds the feasible range (plotted in dashed lines) of the model output, and

FIG. 8 shows the iterative augmentation of log(det(I)) including the three stages of FIG. 7: Initial Design (I), First Valid Design (II) and Final Design (III).

DETAILED DESCRIPTION

The following specification explains how to elaborate an optimal DoE batch procedure that complies with input and output constraints, using a dynamic multilayer perceptron (MLP) network to represent the real nonlinear system under consideration, i.e. the combustion engine, drive train or the like that is to be analysed in a test run, e.g. on a test stand. The procedure seeks to optimize the Fisher information matrix of an MLP and is thus applicable to a wide class of dynamic systems. In the literature, several methods for OED based on neural networks have been proposed which make use of candidate sets for the optimization of excitation signals. In D. A. Cohn “Neural network exploration using optimal experiment design” Neural Networks, 9(6):1071-1083, 1996, OED is applied to learning problems by allowing the learner at each time step to select a new system input from a candidate set. The new input is selected, subject to either static or dynamic input constraints, by minimizing the expected value of the learner's mean squared error. It is further known that the D-optimality criterion is applied for the reduction of training data for global dynamic models based on dynamic neural networks. A locally sequential D-optimal DoE has also been proposed and formulated as a model predictive control problem by means of optimization over a set of feasible system inputs, which are based on the temporal progression of a given APRBS. The use of MLP networks for model based DoE is inspired by the above-mentioned publications and is combined with a gradient algorithm for the enhancement of excitation signals. In contrast to most state-of-the-art approaches, the optimization of the design criterion is carried out analytically here, so that no candidate set is required. The simultaneous optimization of both the temporal and spatial progression of the excitation signal and the compliance with input and output constraints is proposed. In this context, dynamic model based DoE has to take into account the inherent effect of every excitation signal on all future system outputs, so that its influence on the Fisher information matrix as well as the adherence to output constraints can become very complex.

MLP networks are chosen as the nonlinear model structure, since they belong to the class of universal approximators. Other nonlinear model structures suitable for use according to the present invention are Local Model Networks (LMN) or Takagi-Sugeno fuzzy models.

More specifically, multiple input single output (MISO) MLP networks are used, as shown in FIG. 2. The free model parameters are given by the input weights W and the output weights ω, which can be combined to the parameter vector θ.

$\begin{matrix} {W = \begin{bmatrix} W_{10} & \ldots & W_{1\; n_{\varphi}} \\ \vdots & \ddots & \vdots \\ W_{n_{h}0} & \ldots & W_{n_{h}n_{\varphi}} \end{bmatrix}} & (1) \\ {\omega = \begin{bmatrix} \omega_{10} & \ldots & \omega_{1\; n_{h}} \end{bmatrix}^{T}} & (2) \\ {\theta = \begin{bmatrix} {\omega_{10}\mspace{14mu}\ldots\mspace{14mu}\omega_{1\; n_{h}}} & {W_{10\mspace{14mu}}\ldots\mspace{14mu} W_{1\; n_{\varphi}}\mspace{14mu}\ldots\mspace{14mu} W_{n_{h}0}\mspace{14mu}\ldots\mspace{14mu} W_{n_{h}n_{\varphi}}} \end{bmatrix}^{T}} & (3) \end{matrix}$

The model output ŷ is calculated as a weighted sum of nonlinear sigmoid activation functions f_(i) in the hidden layer. There are n_(φ) model inputs φ_(i), which constitute the regression vector φ.

$\begin{matrix} {\mspace{79mu}{{{f_{i}\left( h_{i} \right)} = \frac{1}{1 +}}\mspace{79mu}{\varphi = \left\lbrack {\varphi_{1}\mspace{14mu}\ldots\mspace{14mu}\varphi_{n_{\varphi}}} \right\rbrack}}} & (4) \end{matrix}$ The approximation capability of the MLP network is determined by the number of hidden units n_(h). In vector matrix notation the input to the hidden layer h and the output of the hidden layer o are given by:

$\begin{matrix} {{h = {W\begin{bmatrix} 1 \\ \varphi^{T} \end{bmatrix}}}{{o(h)} = \begin{bmatrix} {f_{1}\left( h_{1} \right)} \\ \vdots \\ {f_{n_{h}}\left( h_{n_{h}} \right)} \end{bmatrix}}} & (5) \end{matrix}$ The output of the MLP network is a nonlinear function g(φ,θ), which depends on the regression vector φ and parameter vector θ.

$\begin{matrix} \begin{matrix} {\hat{y} = {g\left( {\varphi,\theta} \right)}} \\ {= {\omega^{T}\begin{bmatrix} 1 \\ {o(h)} \end{bmatrix}}} \\ {= {{\sum\limits_{j = 1}^{n_{h}}{\omega_{1j}{f_{j}\left( {{\sum\limits_{l = 1}^{n_{\varphi}}{W_{jl}\varphi_{l}}} + W_{j0}} \right)}}} + \omega_{10}}} \end{matrix} & (6) \end{matrix}$

Both the inputs to the hidden units and the output of the hidden layer contain the offset terms W_(j0) and ω_(10). It is assumed that the system output of the real process y(k) is given by the model ŷ(k,θ) and some Gaussian error e(k) with zero mean and variance σ²:

$\begin{matrix} {y(k) = g\left( \varphi(k),\theta \right) + e(k) = \hat{y}\left( k,\theta \right) + e(k)} & (7) \end{matrix}$
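The forward pass of (5) and (6) can be sketched in a few lines of Python. This is only an illustration: the weights are hypothetical, and the hyperbolic tangent is assumed as the sigmoid activation because the derivative stated in (21) has the form of the tanh derivative.

```python
import math

def mlp_output(phi, W, omega):
    """Evaluate y_hat = omega^T [1; o(h)] with h = W [1; phi], cf. (5) and (6).

    phi   : list of n_phi regressors
    W     : n_h rows, each [W_j0, W_j1, ..., W_jn_phi] (bias first)
    omega : [omega_10, omega_11, ..., omega_1n_h] (bias first)
    """
    x = [1.0] + list(phi)                                     # [1; phi]
    h = [sum(w * xi for w, xi in zip(row, x)) for row in W]   # hidden-layer input
    o = [math.tanh(hj) for hj in h]                           # ASSUMPTION: tanh activation
    return omega[0] + sum(wj * oj for wj, oj in zip(omega[1:], o))

# Hypothetical weights for a network shaped like FIG. 2
# (three inputs, two hidden units, one output).
W = [[0.1, 0.5, -0.3, 0.2],
     [-0.2, 0.4, 0.1, -0.6]]
omega = [0.05, 1.0, -0.5]
y_hat = mlp_output([0.0, 0.0, 0.0], W, omega)
```

With all regressors at zero only the bias terms W_(j0) reach the hidden units, which makes the role of the offset terms in (6) easy to check by hand.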

So far it has not been specified whether the MLP network is used for static or dynamic systems. In the following, dynamic MLP networks in the output error (NNOE) configuration are considered. Then, the regression vector φ(k,θ) at the k-th observation comprises past network outputs ŷ(k−i) with i=1 . . . n and past system inputs u(k−d−j+1) with j=1 . . . m and dead time d (see FIG. 3).

$\begin{matrix} {\varphi(k,\theta) = \left\lbrack \hat{y}(k-1,\theta)\;\ldots\;\hat{y}(k-n,\theta)\;\; u_{1}(k-d)\;\ldots\; u_{1}(k-d-m)\;\ldots\; u_{n_{u}}(k-d)\;\ldots\; u_{n_{u}}(k-d-m) \right\rbrack} & (8) \end{matrix}$

It is assumed that the excitation signal U consists of N observations and n_(u) different inputs:

$\begin{matrix} {U = {\left\lbrack {u_{1}\mspace{14mu}\ldots\mspace{14mu} u_{n_{u}}} \right\rbrack = \begin{bmatrix} {u_{1}(1)} & \ldots & {u_{n_{u}}(1)} \\ \vdots & \ddots & \vdots \\ {u_{1}(N)} & \ldots & {u_{n_{u}}(N)} \end{bmatrix}}} & (9) \end{matrix}$ The measured system output y is given by:

$\begin{matrix} {y = \begin{bmatrix} {y(1)} \\ \vdots \\ {y(N)} \end{bmatrix}} & (10) \end{matrix}$
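The output error configuration can be illustrated with a short simulation loop in which the model's own past outputs are fed back into the regression vector. The sketch below covers a single input and makes two assumptions that are not taken from the specification: samples before the start of the record are treated as zero, and the input lags follow the indexing u(k−d−j+1) with j=1 . . . m. The function `g` stands for any trained model and is hypothetical.

```python
def nnoe_regressor(y_hat, u, k, n, m, d):
    """Regression vector at time k: n past model outputs, then m past inputs."""
    def past(seq, t):
        # ASSUMPTION: samples outside the record are zero
        return seq[t] if 0 <= t < len(seq) else 0.0
    return ([past(y_hat, k - i) for i in range(1, n + 1)] +
            [past(u, k - d - j + 1) for j in range(1, m + 1)])

def simulate_nnoe(g, u, n, m, d):
    """Run the model in output-error mode: its own past outputs are fed back."""
    y_hat = []
    for k in range(len(u)):
        y_hat.append(g(nnoe_regressor(y_hat, u, k, n, m, d)))
    return y_hat

# Hypothetical linear stand-in for the network g(phi, theta)
def g(phi):
    return 0.5 * phi[0] + phi[1]

response = simulate_nnoe(g, [1.0, 0.0, 0.0, 0.0], n=1, m=1, d=1)
```

Because the loop never touches measured outputs, an input applied at one time step propagates into every later model output, which is the property the later constraint handling relies on.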

The network parameters θ have to be adjusted to the experimental input and output data. The training of the network weights is usually carried out with a standard Levenberg-Marquardt algorithm, which aims to minimize a quadratic cost function V_(N)(θ) based on the prediction error ε(k):

$\begin{matrix} {{V_{N}(\theta)} = {{\frac{1}{2N}{\sum\limits_{k = 1}^{N}\;\left( {{y(k)} - {\hat{y}\left( {k,\theta} \right)}} \right)^{2}}} = {\frac{1}{2N}{\sum\limits_{k = 1}^{N}\;{\varepsilon^{2}\left( {k,\theta} \right)}}}}} & (11) \end{matrix}$

Optimal model based DoE is targeted to maximize the information content of experiments. For this purpose, an optimality criterion is optimized, which is usually derived from the Fisher information matrix. In this context, the process model in the form of a MLP network is necessary for the calculation of the Fisher information matrix. From a statistical point of view, the Fisher information matrix I gives a statement about the information content of data, in terms of the covariance of estimated parameters. Popular design criteria for model based DoE are the trace of I⁻¹ (A-optimality), the determinant of I (D-optimality) and the smallest eigenvalue of I (E-optimality).

Model based DoE can be applied in any of the following situations: a model of a system already exists and only some parts of the system have been modified, so that similar behavior can be expected; the system is operated under changed environmental conditions; or a model of a similar system is available.

The Fisher matrix is a statistical measure of the amount of information in the underlying data, and its inverse gives the covariance matrix of the estimated model parameters θ. It is formed from the derivatives of the model output with respect to the model parameters:

$I = {\frac{1}{\sigma^{2}}{\sum\limits_{k = 1}^{N}\;{\frac{\partial{\hat{y}(k)}}{\partial\theta}\frac{\partial{\hat{y}(k)}^{T}}{\partial\theta}}}}$

Popular design criteria for model based DoE use a scalar value of the Fisher matrix or of its inverse as objective function. In doing so, a measure for the information content is needed, in order to be able to optimize excitation signals. Since the Fisher information matrix gives statistical evidence about the information content of the underlying data and the covariance of the estimated parameters, respectively, it is a basis for common design criteria like A-, D- and E-optimality.

The Fisher information matrix depends on the parameter sensitivity vector ψ(k), which describes the functional dependence of the model output on the model parameters:

$\begin{matrix} {{\psi(k)} = \frac{\partial{\hat{y}\left( {{\varphi\left( {k,\theta} \right)},\theta} \right)}}{\partial\theta}} & (12) \end{matrix}$

The Fisher information matrix comprises the parameter sensitivity vectors of all observations k=1 . . . N according to

$\begin{matrix} {{{\mathcal{I}(\Psi)} = {\frac{1}{\sigma^{2}}\Psi^{T}\Psi}},} & (13) \end{matrix}$

where the parameter sensitivity matrix Ψ combines the parameter sensitivities for all observations:

$\begin{matrix} {\Psi = \begin{bmatrix} {\psi^{T}(1)} \\ \vdots \\ {\psi^{T}(N)} \end{bmatrix}} & (14) \end{matrix}$
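As a small illustration of (13) and (14), the Fisher information matrix can be assembled directly from the rows ψ^T(k) of the sensitivity matrix. The numerical values of Ψ and σ² below are made up for the example.

```python
def fisher_information(Psi, sigma2=1.0):
    """I = Psi^T Psi / sigma^2, cf. (13); Psi holds one row psi^T(k) per observation."""
    p = len(Psi[0])
    return [[sum(row[a] * row[b] for row in Psi) / sigma2 for b in range(p)]
            for a in range(p)]

# Hypothetical sensitivities for N = 3 observations and 2 parameters
Psi = [[1.0, 0.0],
       [1.0, 1.0],
       [1.0, 2.0]]
info = fisher_information(Psi, sigma2=0.5)
```

Each observation adds the outer product of its sensitivity vector with itself, so the result is the same as the summation form given earlier for I.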

The parameter sensitivity is separately obtained for the MLP output weights ψ_(ω)(k) and the input weights ψ_(W)(k):

$\begin{matrix} {{\psi_{\omega}(k)} = {\frac{\partial{\hat{y}\left( {{\varphi\left( {k,\theta} \right)},\theta} \right)}}{\partial\omega} = \begin{bmatrix} 1 \\ {o\left( {h(k)} \right)} \end{bmatrix}}} & (15) \\ {{\psi_{\omega}(k)}\mspace{14mu} \in {\mathbb{R}}^{{\lbrack{1 + n_{h}}\rbrack}\; \times 1}} & (16) \\ {{\Psi_{W}(k)} = {\frac{\partial{\hat{y}\left( {{\varphi\left( {k,\theta} \right)},\theta} \right)}}{\partial W} = {{diag}\mspace{14mu}\left( {o^{\prime}\left( {h(k)} \right)} \right){\overset{\sim}{\omega}\left\lbrack {1{\varphi\left( {k,\theta} \right)}} \right\rbrack}}}} & (17) \\ {{\Psi_{W}(k)}\mspace{14mu} \in {\mathbb{R}}^{n_{h} \times {\lbrack{1 + n_{\varphi}}\rbrack}}} & (18) \end{matrix}$

The output weight vector without bias term is indicated with {tilde over (ω)}:

$\begin{matrix} {\overset{\sim}{\omega} = \left\lbrack {\omega_{11}\mspace{14mu}\ldots\mspace{14mu}\omega_{1n_{h}}} \right\rbrack^{T}} & (19) \\ {{o^{\prime}\left( {h(k)} \right)} = {{\left\lbrack \frac{\partial{o_{i}\left( {h_{i}(k)} \right)}}{\partial{h_{i}(k)}} \right\rbrack\mspace{14mu} i} = {1\mspace{14mu}\ldots\mspace{14mu} n_{h}}}} & (20) \\ {{{o^{\prime}\left( {h(k)} \right)} = {1 - {{o\left( {h(k)} \right)} \cdot {o\left( {h(k)} \right)}}}},} & (21) \end{matrix}$

Here, o′(h(k)) denotes a column vector whose i-th element is the derivative of the i-th output of the hidden layer with respect to the i-th input to the hidden layer at the k-th observation. In (17), diag(o′(h(k))) denotes a diagonal matrix whose entries are the elements of the vector o′(h(k)), and the dot (·) in (21) denotes the Hadamard (element-wise) product.
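The sensitivities (15) and (17) can be checked numerically. The following sketch assumes a tanh activation, for which the derivative takes the form stated in (21), and uses hypothetical weights; the helper names are illustrative only.

```python
import math

def forward(phi, W, omega):
    """Return y_hat, the augmented input [1; phi] and the hidden outputs o(h)."""
    x = [1.0] + list(phi)
    o = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]
    return omega[0] + sum(wj * oj for wj, oj in zip(omega[1:], o)), x, o

def sensitivities(phi, W, omega):
    """Return psi_omega as in (15) and Psi_W as in (17) for one regression vector."""
    _, x, o = forward(phi, W, omega)
    psi_omega = [1.0] + o                           # d y_hat / d omega
    o_prime = [1.0 - oj * oj for oj in o]           # (21), valid for tanh
    Psi_W = [[o_prime[j] * omega[j + 1] * xi for xi in x]   # diag(o') * omega_tilde * [1 phi]
             for j in range(len(W))]
    return psi_omega, Psi_W

# Hypothetical example with two regressors and two hidden units
phi = [0.3, -0.7]
W = [[0.1, 0.5, -0.3], [-0.2, 0.4, 0.1]]
omega = [0.05, 1.0, -0.5]
psi_omega, Psi_W = sensitivities(phi, W, omega)
```

A finite-difference perturbation of a single weight is a convenient way to confirm that an entry of Ψ_(W) has the right value before it is used to build the Fisher matrix.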

Frequently used optimality criteria based on the Fisher information matrix are A-, D- and E-optimality. A-optimality is targeted to minimize the sum of the parameter variances. Therefore, the associated design criterion is based on the trace of I⁻¹: J _(A)(Ψ)=Tr(I ⁻¹(Ψ))  (22)

D-optimality uses the determinant of I, which is in contrast to A-optimality more sensitive to single parameter covariances, since the determinant equals the product of the eigenvalues. Moreover, D-optimality is invariant under any nonsingular reparametrization, which does not depend on the experiment. J _(D)(Ψ)=det(I(Ψ))  (23)

For the E-optimal design the smallest eigenvalue λmin of the Fisher information matrix is subject of maximization. J _(E)(Ψ)=λ_(min)(I(Ψ))  (24)
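For a model with only two parameters, the three criteria (22) to (24) have closed forms, which makes their different behaviour easy to inspect. The sketch below is restricted to a symmetric 2×2 Fisher matrix; the example matrix is hypothetical.

```python
import math

def design_criteria_2x2(I):
    """Return (J_A, J_D, J_E) of (22)-(24) for a symmetric 2x2 Fisher matrix."""
    (a, b), (_, c) = I
    det = a * c - b * b                                   # J_D = det(I)
    trace_inv = (a + c) / det                             # J_A = Tr(I^-1) = Tr(I)/det(I) for 2x2
    lam_min = 0.5 * (a + c) - math.sqrt(0.25 * (a - c) ** 2 + b * b)   # J_E
    return trace_inv, det, lam_min

j_a, j_d, j_e = design_criteria_2x2([[2.0, 0.0], [0.0, 0.5]])
```

In this example the poorly determined second parameter dominates both J_A and J_E, whereas J_D only sees the product of the two eigenvalues.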

The design criterion J can only be influenced by changing the system inputs u_(i)(k). For the optimization of the design criterion, usually a candidate set of feasible inputs is generated, from which certain inputs are selected that optimize the design criterion.

The proposed method for model based DoE realizes the minimization of (22) and the maximization of (23) and (24) by an analytical calculation of optimized excitation signals with consideration of input and output constraints.

Generally, the enhancement of excitation signals is targeted to optimize the design criterion with simultaneous adherence to constraints. Mathematically, the optimization problem with consideration of input, input rate and output constraints is stated as:

$\begin{matrix} {\text{A-optimality:}\quad J_{A}\rightarrow\min\limits_{u_{i}(k)}} & (25) \\ {\text{D- and E-optimality:}\quad J_{D},J_{E}\rightarrow\max\limits_{u_{i}(k)}} & (26) \\ {\text{Input constraint:}\quad u_{\min} \leq u_{i}(k) \leq u_{\max}} & (27) \\ {\text{Rate constraint:}\quad \Delta u_{\min} \leq u_{i}(k+1) - u_{i}(k) \leq \Delta u_{\max}} & (28) \\ {\text{Output constraint:}\quad y_{\min} \leq \hat{y}(k) \leq y_{\max}} & (29) \end{matrix}$
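A simple way to see which of the constraints (27) to (29) a given design violates is to scan the excitation signal and the predicted output sample by sample. The function below is a sketch for a single input; its name and the tuple-based limits are illustrative choices.

```python
def constraint_violations(u, y_hat, u_lim, du_lim, y_lim):
    """List (kind, k) for every sample that violates (27), (28) or (29)."""
    found = []
    for k, uk in enumerate(u):
        if not u_lim[0] <= uk <= u_lim[1]:
            found.append(("input", k))
        if k + 1 < len(u) and not du_lim[0] <= u[k + 1] - uk <= du_lim[1]:
            found.append(("rate", k))
    for k, yk in enumerate(y_hat):
        if not y_lim[0] <= yk <= y_lim[1]:
            found.append(("output", k))
    return found

violations = constraint_violations(
    u=[0.0, 0.5, 2.0], y_hat=[0.0, 0.2, 1.5],
    u_lim=(-1.0, 1.0), du_lim=(-1.0, 1.0), y_lim=(-1.0, 1.0))
```

The violated entries are the ones that become active constraints in the constrained gradient step described further on.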

The enhancement of the design criterion J poses a nonlinear optimization problem, for which different optimization methods are available. For the enhancement of the design criterion of linear dynamic systems, a gradient algorithm is known. For use in the present invention, an iterative gradient descent method is used to optimize the excitation signals of nonlinear dynamic systems. The proposed method for the analytical calculation of optimized excitation signals is done in two steps. First, the gradient of the design criterion with respect to the dynamic system inputs is determined, and second, the system inputs are updated recursively and input and output constraints are observed.

The optimization is based on the calculation of the gradient of the objective function represented by the design criterion with respect to the dynamic system inputs. In each iteration, the excitation signal is updated such that the design criterion is improved while the compliance to constraints is accomplished.

The derivative of the design criterion J with respect to the i-th system input u_(i)(k) is calculated by the use of the chain rule in three steps:

$\begin{matrix} {\frac{\mathbb{d}{J(\Psi)}}{\mathbb{d}{u_{i}(k)}} = {\sum\limits_{l = 1}^{N}\;{\underset{\underset{(i)}{︸}}{\frac{\mathbb{d}{J(\Psi)}}{\mathbb{d}{\psi^{T}(l)}}}\underset{\underset{({ii})}{︸}}{\frac{\mathbb{d}{\psi(l)}}{\mathbb{d}{\varphi\left( {l,\theta} \right)}}}\underset{\underset{({iii})}{︸}}{\frac{\mathbb{d}{\varphi^{T}\left( {l,\theta} \right)}}{\mathbb{d}{u_{i}(k)}}}}}} & (30) \end{matrix}$

Ad(i): First, the derivative of the design criterion with respect to the parameter sensitivity vector for the l-th observation ψ(l) is required. For A-optimality and D-optimality the result is given by:

$\begin{matrix} {\frac{\mathbb{d}{J_{A}(\Psi)}}{\mathbb{d}{\psi^{T}(l)}} = {{- 2}{s(l)}{\Psi\left\lbrack {\Psi^{T}\Psi} \right\rbrack}^{- 2}}} & (31) \\ {\frac{\mathbb{d}{J_{D}(\Psi)}}{\mathbb{d}{\psi^{T}(l)}} = {2{s(l)}{J_{D}(\Psi)}{\Psi\left\lbrack {\Psi^{T}\Psi} \right\rbrack}^{- 1}}} & (32) \\ {{s(l)} = {\left\lbrack {0\mspace{14mu}\ldots\mspace{14mu} 1\mspace{14mu}\ldots\mspace{14mu} 0} \right\rbrack\mspace{14mu} \in {\mathbb{R}}^{1 \times N}}} & (33) \end{matrix}$

where s(l) represents the single entry vector, which equals 1 at the l-th position and 0 elsewhere. E-optimality requires the calculation of the derivative of the smallest eigenvalue λ_(min) of the Fisher matrix with respect to the parameter sensitivity vector ψ(l), which results in:

$\begin{matrix} {\frac{\mathbb{d}{J_{E}(\Psi)}}{\mathbb{d}{\psi(l)}^{T}} = {2{s(l)}\Psi\; x_{\min}{x_{\min}}^{T}}} & (34) \end{matrix}$

Here, x_(min) indicates the eigenvector of the smallest eigenvalue λmin: Ix _(min)=λ_(min) x _(min)  (35)

In (31) and (32) the inverse of the Fisher matrix appears. Therefore, the Fisher matrix must be regular, in order to be invertible. It was already shown that a singular Fisher information matrix based on a MLP network can be made regular by the elimination of redundant neurons.
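The gradient (32) can be verified against a finite-difference approximation. The sketch below takes σ² = 1, so that J_D = det(Ψ^TΨ), and is limited to two parameters so that the inverse can be written out explicitly; the sensitivity values are hypothetical.

```python
def gram(Psi):
    """Psi^T Psi for a sensitivity matrix with two columns."""
    p = len(Psi[0])
    return [[sum(r[a] * r[b] for r in Psi) for b in range(p)] for a in range(p)]

def j_d(Psi):
    """D-criterion det(Psi^T Psi), i.e. (23) with sigma^2 = 1 (assumption)."""
    (a, b), (_, c) = gram(Psi)
    return a * c - b * b

def grad_j_d(Psi):
    """Row l holds dJ_D/dpsi^T(l) = 2 J_D psi^T(l) [Psi^T Psi]^-1, cf. (32)."""
    (a, b), (_, c) = gram(Psi)
    det = a * c - b * b                      # must be nonzero (regular Fisher matrix)
    inv = [[c / det, -b / det], [-b / det, a / det]]
    return [[2.0 * det * (r[0] * inv[0][j] + r[1] * inv[1][j]) for j in range(2)]
            for r in Psi]

Psi = [[1.0, 0.2], [0.5, 1.0], [-0.3, 0.8]]   # hypothetical sensitivities
gradient = grad_j_d(Psi)
```

The division by the determinant shows directly why a singular Fisher matrix has to be made regular before (31) or (32) can be evaluated.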

Ad(ii): The derivative of the parameter sensitivity vector for the output weights ψ_(ω)(l) with respect to the regression vector φ(l,θ) is given by:

$\begin{matrix} {\frac{\mathbb{d}{\psi_{\omega}(l)}}{\mathbb{d}{\varphi\left( {l,\theta} \right)}} = {\begin{bmatrix} 0 \\ {{{diag}\left( {o^{\prime}\left( {h(l)} \right)} \right)}\overset{\sim}{W}} \end{bmatrix} \in {\mathbb{R}}^{{\lbrack{1 + n_{h}}\rbrack} \times n_{\varphi}}}} & (36) \end{matrix}$

Here, the input weight matrix without bias terms is denoted by W̃:

$\begin{matrix} {\overset{\sim}{W} = \begin{bmatrix} W_{11} & \ldots & W_{1n_{\varphi}} \\ \vdots & \ddots & \vdots \\ W_{n_{h}1} & \ldots & W_{n_{h}n_{\varphi}} \end{bmatrix}} & (37) \end{matrix}$

The derivative of the parameter sensitivity matrix for the input weights Ψ_(W)(l) with respect to the i-th component of the regression vector φ_(i)(l,θ) is determined as follows:

$\begin{matrix} {\frac{\mathbb{d}{\Psi_{W}(l)}}{\mathbb{d}{\varphi_{i}\left( {l,\theta} \right)}} = {{{{diag}\left( {o^{\prime}\left( {h(l)} \right)} \right)}{\overset{\sim}{\omega}\left\lbrack {0e_{i}} \right\rbrack}} + {\begin{pmatrix} {\left( {{{diag}\left( {o^{''}\left( {h(l)} \right)} \right)}\overset{\sim}{\omega}} \right) \cdot} \\ \left( {\overset{\sim}{W}{s(i)}^{T}} \right) \end{pmatrix}\left\lbrack {1{\varphi\left( {l,\theta} \right)}} \right\rbrack}}} & (38) \\ {\frac{\mathbb{d}{\Psi_{W}(l)}}{\mathbb{d}{\varphi_{i}\left( {l,\theta} \right)}} \in {\mathbb{R}}^{n_{h} \times {\lbrack{1 + n_{\varphi}}\rbrack}}} & (39) \end{matrix}$

where o″(h(l)) denotes a column vector whose i-th entry is the second derivative of the i-th output of the hidden layer with respect to the i-th input to the hidden layer at the l-th observation:

$\begin{matrix} {o^{''}\left( h(l) \right) = \left\lbrack \frac{\partial^{2}o_{i}\left( h_{i}(l) \right)}{\partial h_{i}^{2}(l)} \right\rbrack,\quad i = 1\ldots n_{h}} & (40) \\ {{o^{''}\left( {h(l)} \right)} = {{- 2}{{o\left( {h(l)} \right)} \cdot {o^{\prime}\left( {h(l)} \right)}}}} & (41) \end{matrix}$

Here e_(i) denotes the unit vector in the direction of the i-th component of φ(l,θ), and s(i) is again the single entry vector.

$\begin{matrix} {e_{i} = \left\lbrack 0\;\ldots\;1\;\ldots\;0 \right\rbrack \in {\mathbb{R}}^{1 \times n_{\varphi}}} & (42) \\ {s(i) = \left\lbrack 0\;\ldots\;1\;\ldots\;0 \right\rbrack \in {\mathbb{R}}^{1 \times n_{\varphi}}} & (43) \end{matrix}$

Ad(iii): For dynamic autoregressive systems the regression vector depends not only on past system inputs but also on past model outputs. Therefore, the derivative of φ(l+1,θ) with respect to u_(i)(l−j) requires the calculation of the derivative of past model outputs ŷ(l,θ) with respect to u_(i)(l−j). Dynamic system inputs u_(i)(k) have on the one hand a direct impact on the model output and on the other hand an indirect influence via the past n model outputs. Taking this into account and using the chain rule, the derivative of ŷ(l,θ) with respect to u_(i)(l−j) is given by:

$\begin{matrix} {\frac{\mathbb{d}\hat{y}\left( l,\theta \right)}{\mathbb{d}u_{i}\left( l - j \right)} = \frac{\partial g\left( \varphi\left( l,\theta \right),\theta \right)}{\partial\hat{y}\left( l - 1,\theta \right)}\underset{\text{recursive calculation}}{\underbrace{\frac{\mathbb{d}\hat{y}\left( l - 1,\theta \right)}{\mathbb{d}u_{i}\left( l - j \right)}}} + \ldots + \frac{\partial g\left( \varphi\left( l,\theta \right),\theta \right)}{\partial\hat{y}\left( l - n,\theta \right)}\frac{\mathbb{d}\hat{y}\left( l - n,\theta \right)}{\mathbb{d}u_{i}\left( l - j \right)} + \frac{\partial g\left( \varphi\left( l,\theta \right),\theta \right)}{\partial u_{i}\left( l - j \right)}} & (44) \\ {j \geq 1,\quad k = l - j} & (45) \end{matrix}$

In the next time step, the result of (44) is used to calculate the derivative of ŷ(l+1,θ) with respect to u_(i)(l−j). In this recursive procedure, all model outputs that are needed for the regression vector at the l-th observation are differentiated. The derivative of g(φ(l,θ),θ) with respect to the elements of the regression vector, which appears in (44), is given in vector matrix notation by:

$\begin{matrix} {\frac{\partial{g\left( {\varphi\left( {l,\theta} \right)} \right)}}{\partial{\varphi\left( {l,\theta} \right)}} = {{{\overset{\sim}{\omega}}^{T}{{diag}\left( {o^{\prime}\left( {h(l)} \right)} \right)}\overset{\sim}{W}} \in {\mathbb{R}}^{1 \times n_{\varphi}}}} & (46) \end{matrix}$

The derivative of all system inputs u_(j)(s) that are used in the regression vector φ(l,θ) with respect to u_(i)(k) is calculated by:

$\begin{matrix} {\frac{\mathbb{d}u_{j}(s)}{\mathbb{d}u_{i}(k)} = \delta_{ij}\delta_{ks},\quad 1 \leq i,j \leq n_{u},\quad l - d - m \leq s \leq l - d} & (47) \\ {\delta_{ij} = \left\{ \begin{matrix} {1,} & {\text{for}\ i = j} \\ {0,} & {\text{for}\ i \neq j} \end{matrix} \right.} & (48) \end{matrix}$

Here, δ_(ij) is the Kronecker delta function, which is 1 for i=j and 0 for i≠j, and i, j denote the input indices from 1 to n_(u).
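The recursion (44) is easiest to follow for the smallest possible case. The sketch below assumes a single input, n = m = d = 1, one tanh hidden unit and a zero initial model output; the weights A0, A1, A2, W0, W1 are hypothetical. The derivative of every model output with respect to one input sample is propagated forward in time alongside the simulation.

```python
import math

A0, A1, A2, W0, W1 = 0.1, 0.6, 0.8, 0.0, 1.2    # hypothetical weights

def g(y_prev, u_prev):
    """One-step model y_hat(l) = g(y_hat(l-1), u(l-1)) with one tanh unit."""
    return W0 + W1 * math.tanh(A0 + A1 * y_prev + A2 * u_prev)

def simulate(u):
    y = [0.0]                                    # ASSUMPTION: y_hat(0) = 0
    for l in range(1, len(u)):
        y.append(g(y[l - 1], u[l - 1]))
    return y

def output_sensitivity(u, k):
    """d y_hat(l) / d u(k) for all l, using the recursion (44) with n = m = d = 1."""
    y = simulate(u)
    dy = [0.0] * len(u)
    for l in range(1, len(u)):
        t = math.tanh(A0 + A1 * y[l - 1] + A2 * u[l - 1])
        slope = W1 * (1.0 - t * t)
        direct = slope * A2 if l - 1 == k else 0.0   # partial of g w.r.t. the input itself
        dy[l] = slope * A1 * dy[l - 1] + direct      # feedback path through y_hat(l-1)
    return dy

u = [0.5, -0.2, 0.9, 0.1, -0.4]
dy_du1 = output_sensitivity(u, k=1)
```

The entries before and at the perturbed sample are zero, the next one carries the direct effect, and all later ones are produced purely by the feedback term.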

The constrained recursive excitation signal optimization is based on the calculation of the gradient (30) with respect to all different inputs u_(i)(k) with i=1 . . . n_(u) for all observations k=1 . . . N. The compliance with input, input rate and output constraints during the optimization procedure is assured by a constrained gradient method. The principle of the method is explained in FIG. 4 for a two dimensional example. In every iteration (indexed by v) the (quadratic) difference between the gradient step δ^((v))∇_(i)J_(D)^((v)) and the excitation signal increment Δu_(i) is minimized while simultaneously the feasible area defined by the constraint vector g ≤ 0 is approached. The constraint vector g=[g_(1) . . . g_(o)] ∈ R^(1×o) comprises all possible constraints. A constraint is active if g_(k)=0 and inactive if g_(k)<0. Here, δ^((v)) denotes the variable step length of the gradient method. Mathematically the problem is expressed as:

$\begin{matrix} {{\left( {{\Delta\; u_{i}} - {\delta^{(v)}{\nabla_{i}J_{D}^{(v)}}}} \right)^{T}\left( {{\Delta\; u_{i}} - {\delta^{(v)}{\nabla_{i}J_{D}^{(v)}}}} \right)}\rightarrow\min\limits_{\Delta\; u_{i}}} & (49) \end{matrix}$

The linearization of the active constraints g_(act)^((v)) is given by g_(lin)^((v)), which must equal zero:

$\begin{matrix} {g_{lin}^{{(v)}^{T}} = {{\left( g_{act}^{(v)} \right)^{T} + {\left( \frac{\mathbb{d}g_{act}^{(v)}}{\mathbb{d}u_{i}} \right)^{T}\Delta\; u_{i}}} = 0}} & (50) \end{matrix}$

For the optimization with active constraints, a scalar Lagrange function L with the corresponding active multiplier row vector λ_(act)^((v)) is defined.

$\begin{matrix} {\mathcal{L}^{(v)} = {{\frac{1}{2}\left( {{\Delta\; u_{i}} - {\delta^{(v)}{\nabla_{i}J_{D}^{(v)}}}} \right)^{T}\left( {{\Delta\; u_{i}} - {\delta^{(v)}{\nabla_{i}J_{D}^{(v)}}}} \right)} + {\lambda_{act}^{(v)}g_{lin}^{{(v)}^{T}}}}} & (51) \end{matrix}$

The extremal value of the Lagrange function is obtained where the derivative of L with respect to Δu_(i) equals zero:

$\begin{matrix} {\frac{\mathbb{d}\mathcal{L}^{(v)}}{{\mathbb{d}\Delta}\; u_{i}} = {{{\Delta\; u_{i}} - {\delta^{(v)}{\nabla_{i}J_{D}^{(v)}}} + {\frac{\mathbb{d}g_{act}^{(v)}}{\mathbb{d}u_{i}}\lambda_{act}^{{(v)}^{T}}}} = 0}} & (52) \end{matrix}$

Then the excitation signal change Δu_(i) is stated as a function of λ_(act):

$\begin{matrix} {{\Delta\; u_{i}} = {{\delta^{(v)}{\nabla_{i}J_{D}^{(v)}}} - {\frac{\mathbb{d}g_{act}^{(v)}}{\mathbb{d}u_{i}}\lambda_{act}^{{(v)}^{T}}}}} & (53) \end{matrix}$

Insertion of this result in the constraint condition (50) gives for λ_(act):

$\begin{matrix} {\lambda_{act}^{{(v)}^{T}} = {\left\lbrack {\left( \frac{\mathbb{d}g_{act}^{(v)}}{\mathbb{d}u_{i}} \right)^{T}\left( \frac{\mathbb{d}g_{act}^{(v)}}{\mathbb{d}u_{i}} \right)} \right\rbrack^{- 1} \cdot \left\lbrack {\left( g_{act}^{(v)} \right)^{T} + {\left( \frac{\mathbb{d}g_{act}^{(v)}}{\mathbb{d}u_{i}} \right)^{T}\delta^{(v)}{\nabla_{i}J_{D}^{(v)}}}} \right\rbrack}} & (54) \end{matrix}$

Inserting this result for λ_(act)^((v)) into (53) gives the final result for the iterative excitation signal change Δu_(i).
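For a single active constraint the bracketed matrix in (54) reduces to a scalar, so the update (53) can be written without any linear algebra library. The sketch below covers only this special case; the gradient, the step length and the violated input bound are hypothetical numbers.

```python
def constrained_step(grad, step, g_act, dg_du):
    """Delta u from (53) and (54) for ONE active constraint.

    grad  : gradient of the design criterion w.r.t. the input samples
    step  : step length delta^(v)
    g_act : current value of the active constraint (scalar)
    dg_du : derivative of the constraint w.r.t. the input samples (vector)
    """
    free = [step * gi for gi in grad]                      # unconstrained gradient step
    lam = ((g_act + sum(a * f for a, f in zip(dg_du, free)))
           / sum(a * a for a in dg_du))                    # (54), scalar case
    return [f - a * lam for f, a in zip(free, dg_du)]      # (53)

# Hypothetical case: sample 1 sits at 1.2 although u_max = 1.0,
# so g_act = u(1) - u_max = 0.2 and dg/du is the unit vector of sample 1.
u_current = [0.3, 1.2, 0.4]
delta_u = constrained_step(grad=[0.5, 0.3, -0.1], step=0.1,
                           g_act=0.2, dg_du=[0.0, 1.0, 0.0])
u_next = [uc + du for uc, du in zip(u_current, delta_u)]
```

The violating sample is moved back onto the bound, while the other samples follow the plain gradient step, which matches the picture of approaching the feasible area while improving the criterion.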

In the following, the active constraints in (50) are treated in detail for input, input rate and output limit violations.

Input constraints: By way of example, it is assumed that the input at the k-th observation lies outside the feasible region defined by [u_(min), u_(max)]; then the following constraints are active:

$\begin{matrix} {u_{i}(k) > u_{\max}:\quad \Delta u_{i}(k) - u_{\max} + u_{i}^{(v)}(k) = 0} & (55) \\ {u_{i}(k) < u_{\min}:\quad \Delta u_{i}(k) - u_{\min} + u_{i}^{(v)}(k) = 0} & (56) \end{matrix}$

Rate constraints: The adherence of the input rate Δu_(i,rate)(k)=u_(i)(k+1)−u_(i)(k) to [Δu_(min), Δu_(max)] is expressed by the following conditions:

For Δu_(i,rate)(k)>Δu_(max): [u_(i)^((v))(k+1)+Δu_(i)(k+1)]−[u_(i)^((v))(k)+Δu_(i)(k)]−Δu_(max)=0   (57)

For Δu_(i,rate)(k)<Δu_(min): −[u_(i)^((v))(k+1)+Δu_(i)(k+1)]+[u_(i)^((v))(k)+Δu_(i)(k)]+Δu_(min)=0   (58)

Output constraints: For the incorporation of output constraints, it must be taken into account that for autoregressive systems the input u_(i)(k) influences all future model outputs ŷ(k+l), l>1. If an output limit violation occurs at the k-th observation, where ŷ(k)>y_(max), then the system inputs u_(i)(l) with l≤k have to be modified so that ŷ(k) is in the feasible range, see FIG. 5. This results in the following constraints:

$\begin{matrix} {{{\hat{y}(k)} > {y_{\max}:{{{\hat{y}}^{(v)}(k)} + {\frac{\mathbb{d}{{\hat{y}}^{(v)}(k)}}{\mathbb{d}u_{i}^{{(v)}^{T}}}\Delta\; u_{i}} - y_{\max}}}} = 0} & (59) \\ {{{\hat{y}(k)} < {y_{\min}:{{- {{\hat{y}}^{(v)}(k)}} - {\frac{\mathbb{d}{{\hat{y}}^{(v)}(k)}}{\mathbb{d}u_{i}^{{(v)}^{T}}}\Delta\; u_{i}} + y_{\min}}}} = 0} & (60) \end{matrix}$

As in a model predictive control problem, future system inputs have to be calculated with the development of future system outputs taken into account. This requires the derivative of the model output with respect to the dynamic excitation signal, see equation (44).
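The role of the output sensitivity can be sketched numerically. The Python example below is only an illustration under stated assumptions: the toy model `simulate` is hypothetical and is not the MLP model of the disclosure, and the derivative is approximated by central finite differences rather than by the analytical expression of equation (44).

```python
import numpy as np

def simulate(u):
    """Hypothetical autoregressive toy model: y(k) = 0.5*y(k-1) + tanh(u(k-1))."""
    y = np.zeros(len(u))
    for k in range(1, len(u)):
        y[k] = 0.5 * y[k - 1] + np.tanh(u[k - 1])
    return y

def output_jacobian(model, u, eps=1e-6):
    """Central-difference estimate of J[k, l] = d y_hat(k) / d u(l)."""
    u = np.asarray(u, dtype=float)
    n = len(u)
    J = np.zeros((n, n))
    for l in range(n):
        step = np.zeros(n)
        step[l] = eps
        J[:, l] = (model(u + step) - model(u - step)) / (2.0 * eps)
    return J

def output_constraints(model, u, y_min, y_max):
    """Linearized active output constraints in the form g + G.T @ du = 0."""
    u = np.asarray(u, dtype=float)
    y = model(u)
    J = output_jacobian(model, u)
    g, cols = [], []
    for k in range(len(y)):
        if y[k] > y_max:                 # upper output limit violated
            g.append(y[k] - y_max)
            cols.append(J[k])
        elif y[k] < y_min:               # lower limit: same linearization, signs flipped
            g.append(y_min - y[k])
            cols.append(-J[k])
    G = np.column_stack(cols) if cols else np.zeros((len(y), 0))
    return np.array(g), G

u = np.array([2.0, 0.0, 0.0, 0.0])
J = output_jacobian(simulate, u)
g, G = output_constraints(simulate, u, y_min=-0.8, y_max=0.8)
```

For this toy model the sensitivity matrix is strictly lower triangular, which reflects that an output at observation k can only be corrected through earlier inputs. Because the constraint is a linearization, a single update generally does not remove a large violation, which is consistent with the iterative procedure described above.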

The following example demonstrates the effectiveness of the proposed method for model-based DoE with MLP networks on a nonlinear dynamic process. It shows how the determinant of the Fisher matrix is iteratively improved while complying with input, input rate and output constraints. As the initial DoE an APRB signal is generated, which is subsequently enhanced by applying the presented method. The optimization procedure is applied to a single-input single-output (SISO) Wiener model. The Wiener model consists of a serial arrangement of a linear time-invariant transfer function G(z^(−1)) and a static nonlinearity NL at the system output:

$\begin{matrix} {NL:\; y = \arctan\,\upsilon} & (61) \end{matrix}$

Here, the transfer function describes an oscillatory system of second order:

$\begin{matrix} {{G\left( z^{- 1} \right)} = {\frac{{0.001867z^{- 1}} + {0.01746z^{- 2}}}{1 - {1.7826z^{- 1}} + {0.8187z^{- 2}}} = \frac{V\left( z^{- 1} \right)}{U\left( z^{- 1} \right)}}} & (62) \end{matrix}$
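As a minimal sketch, the Wiener model of equations (61) and (62) can be simulated with the difference equation that follows from the transfer function. The code assumes zero initial conditions; the function name `wiener_model` and the step input are illustrative choices.

```python
import math

# Coefficients of G(z^-1) as printed in equation (62).
B = (0.001867, 0.01746)   # numerator coefficients of z^-1 and z^-2
A = (-1.7826, 0.8187)     # denominator coefficients of z^-1 and z^-2

def wiener_model(u):
    """Simulate v = G(z^-1) u followed by y = arctan(v), zero initial conditions."""
    v = [0.0] * len(u)
    for k in range(len(u)):
        u1 = u[k - 1] if k >= 1 else 0.0
        u2 = u[k - 2] if k >= 2 else 0.0
        v1 = v[k - 1] if k >= 1 else 0.0
        v2 = v[k - 2] if k >= 2 else 0.0
        # v(k) = 1.7826 v(k-1) - 0.8187 v(k-2) + 0.001867 u(k-1) + 0.01746 u(k-2)
        v[k] = -A[0] * v1 - A[1] * v2 + B[0] * u1 + B[1] * u2
    return [math.atan(x) for x in v]

# Response to a unit step of 500 samples.
y_step = wiener_model([1.0] * 500)
static_gain = (B[0] + B[1]) / (1.0 + A[0] + A[1])
```

The step response settles at arctan of the static gain of G(z^(−1)), which gives a quick plausibility check for an implementation of the example process.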

The output data are generated with the Wiener model illustrated in FIG. 6. Then, using a standard tool, e.g. the NNSYSID Toolbox for Matlab, a reference model of the underlying process is trained. In this case, an MLP network with five neurons already approximates the Wiener model fairly well. As the initial DoE for the optimization procedure a state-of-the-art APRB signal with 100 samples is used. FIG. 7 depicts the excitation signal u, the input rate Δu_(rate) and the model output ŷ, together with the associated constraints plotted as dashed lines, for the initial, the first valid and the final design. An initial design that violates the output constraints was chosen in order to demonstrate the functionality of the presented constrained optimization procedure. The first valid design is reached when all constraints are satisfied for the first time. Here, the optimization algorithm is stopped after 40 iterations and the final design is obtained. FIG. 8 depicts the associated iterative augmentation of the logarithm of the determinant of the Fisher matrix. The determinant decreases in the first iteration because the output limit violation of the initial DoE has to be compensated; the first valid design is reached in the fourth iteration. The augmentation of the determinant of the Fisher matrix causes an increased excitation of the system output dynamics, as shown in FIG. 7. This is a reasonable result because more information is gathered if the system is excited over the whole output range. An increase of the information content of the excitation signal is equivalent to a reduction of the parameter uncertainty of the estimated model parameters.
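The relation between excitation range and information content can be illustrated with a small Python sketch. It assumes the common form of the Fisher matrix for Gaussian measurement noise, the sum of outer products of the parameter sensitivities divided by the noise variance, and it uses a hypothetical model that is linear in two parameters instead of the MLP network of the example. The names `fisher_matrix` and `log_det_criterion` are illustrative.

```python
import numpy as np

def fisher_matrix(psi, sigma=1.0):
    """Fisher matrix from sensitivities psi[k, j] = d y_hat(k) / d theta_j (assumed form)."""
    psi = np.asarray(psi, dtype=float)
    return psi.T @ psi / sigma ** 2

def log_det_criterion(fim):
    """Logarithm of the determinant; minus infinity if the matrix is not positive definite."""
    sign, logdet = np.linalg.slogdet(fim)
    return logdet if sign > 0 else -np.inf

def sensitivities(u):
    """Toy model y = theta_0 + theta_1 * u, so the sensitivities are [1, u(k)]."""
    u = np.asarray(u, dtype=float)
    return np.column_stack([np.ones_like(u), u])

u_narrow = [0.4, 0.5, 0.6, 0.5]      # excitation confined to a small range
u_wide = [-1.0, 1.0, -1.0, 1.0]      # excitation covering the whole range

log_det_narrow = log_det_criterion(fisher_matrix(sensitivities(u_narrow)))
log_det_wide = log_det_criterion(fisher_matrix(sensitivities(u_wide)))
```

In this toy case the wide design yields the larger log-determinant, in line with the observation that exciting the system over the whole range gathers more information.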

The present invention proposes a novel method for multilayer perceptron based DoE for nonlinear dynamic systems. The motivation for this work is the creation of an analytical batch procedure for the optimization of dynamic excitation signals in compliance with input, input rate and output constraints, which is required for an online DoE procedure. The effectiveness of the proposed concept for model-based DoE is demonstrated on a nonlinear dynamic system by the iterative augmentation of the determinant of the Fisher matrix, which results in a reduction of the parameter uncertainty of the estimated model parameters. The simulation example showed that optimizing the information content of the excitation signal leads to augmented system output dynamics.

The invention claimed is:
 1. A machine-implemented method for obtaining data from a nonlinear dynamic real system during a test run comprising:
 a) providing a sequence of dynamic excitation signals for at least one measurement channel according to a current design of experiment for said test run,
 b) obtaining model output data by feeding said sequence of excitation signals of said current design of experiment into a model for the real system, said model comprising nonlinear dynamic models,
 c) determining a criterion for the information content of the complete sequence of excitation signals of said current design of experiment,
 d) varying a totality of the sequence of excitation signals of the current design of experiment so that said criterion is improved, in order to generate a new design of experiment with a new sequence of excitation signals,
 e) iteratively repeating steps b) to d) for a number of iterations, each iteration proceeding according to the new sequence of excitation signals,
 f) obtaining, from a final iteration of the number of iterations, a final design of experiment including a final generated sequence of excitation signals,
 g) conducting the test run of the real system using the final generated sequence of excitation signals, and
 h) measuring output data from the real system for the at least one measurement channel during the test run with the final generated sequence of excitation signals.
 2. The method according to claim 1, wherein during each iteration, compliance with constraints on the excitation signals and/or the model output data is checked, whereby in case of non-compliance, the sequence of excitation signals is modified in a way that compliance is restored and the criterion is improved.
 3. The method according to claim 1, wherein after each iteration, a derivative of the criterion with respect to the dynamic excitation signal is determined, and the iterations are stopped as soon as said derivative falls beneath a predetermined value or a predetermined number of iterations is reached.
 4. The method according to claim 1, wherein for each iteration a spatial distribution of design points and a temporal progression of the excitation signal is optimized.
 5. The method according to claim 1, wherein the criterion is determined from the Fisher information matrix, in particular by calculating the trace of the inverse of said matrix, by calculating the determinant or the smallest eigenvalue.
 6. The method according to claim 1, wherein the model output data are determined with a model using Multilayer Perceptron Networks (MLP) as the nonlinear dynamic model architecture.
 7. The method according to claim 1, wherein the model output data are determined with a model using a Local Model Network (LMN) or a Takagi-Sugeno Fuzzy Model as the nonlinear dynamic model architecture. 